Similarity Measures in Documents Using Association Graphs

نویسندگان

  • José Eladio Medina-Pagola
  • Ernesto Martínez
  • José Hernández Palancar
  • Abdel Hechavarría Díaz
  • Raudel Hernández-León
چکیده

In this paper we present a new model, designated as Association Graph, to improve document representation, facilitating the ontological dimension. We explain how to generate and use this kind of graph. Also, we analyze different document similarity measures based on this representation. A classical vector space model was used to evaluate this model and measures, investigating their strengths and weaknesses. The proposed model was found to give promising results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so c...

متن کامل

Microsoft Word - CONTENTS-AUGUST07

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so c...

متن کامل

بررسی قابلیت بهکارگیری سنجه های مرکزیت به عنوان شاخصهای ارتباط استنادی مدارک در بازیابی اطلاعات رابطه ای: مطالعۀ مقدماتی

Purpose: this is a pilot study tends to investigate correlation between centrality measures with bibliographic coupling as a well-known citation-based document similarity measure.  Methodology: using citation analysis method, 40 research articles belonging to four engineering/pure disciplines (Physics, Chemistry, Biology, and computer) and four Humanities and Social disciplines (Economics, Edu...

متن کامل

طراحی سامانۀ تشخیص دستبرد ادبی جمله‌بنیاد در متون فارسی به کمک هم‌جوشی گواه‌ها

Today, there are many documents on Internet, such that users can generate new documents by coping them and existing Plagiarism Detection systems (PDS) couldn't detect all kind of plagiarism. The main challenge is finding a suitable algorithm to improving the amount of similar documents and their assessing time. It’s difficult to do assessing similarity in Persian texts that different characteri...

متن کامل

A Graphical Framework For Contextual Search And Name Disambiguation In Email

Similarity measures for text have historically been an important tool for solving information retrieval problems. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a struct...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005